{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial of implementing BERT using Huggingface Transformers\n",
    "This tutorial introduces how to implement the state-of-art NLP language model, BERT, using the Huggingface Transformer library. This tutorial will walk you through the introduction of BERT, overview of some NLP tasks, specifically GLUE dataset that is used for sentence understanding, followed by the introduction of 🤗Transformer, code examples of training BERT with GLUE dataset built in Tensorflow, and using the pre-trained BERT model to predict some new instances.\n",
    "\n",
    "\n",
    "---\n",
    "\n",
    "# BERT Introduction\n",
    "BERT (Bidirectional Encoder Representations from Transformers) is a state-of-art language model for NLP. [BERT](https://github.com/google-research/bert) released with the paper BERT: [Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. Followed by this recent paper published by researchers at Google AI Language (October 2018), it has caused a stir in the Machine Learning and NLP community by presenting the state-of-art results in eleven most popular NLP tasks, including pushing the [GLUE](https://gluebenchmark.com/) score to 80.5% (7.7% point absolute improvement), [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) accuracy to 86.7% (4.6% absolute improvement), [Stanford's Question Answering Dataset](https://rajpurkar.github.io/SQuAD-explorer/) SQuAD v1.1  Test F1 to 93.2 (1.5 point absolute im- provement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).\n",
    "\n",
    "The key innovation of BERT is its bidirectional training of Transformer, a popular attention model, to language modelling. As the paper concluded:\n",
    ">BERT is designed to pre- train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine- tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.\n",
    "\n",
    "The groundbreaking technology of BERT is its ability to do bidirectional fine-tuning approaches for pre-trained representations. The standard language models today are unidirectional, either left-to-right (like OpenAI GPT) or right-to-left. BERT uses the technology called \"Masked Language Model\" (MLM), which wil be briefed in the following section, to give more context about BERT.\n",
    "\n",
    "\n",
    "## Masked Language Model (MLM)\n",
    "\n",
    "Masked Language Model (MLM)'s pre-training objective is inspired by the Cloze task (Taylor, 1953). \n",
    "```The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context```. Unlike left-to- right language model pre-training, the MLM ob- jective enables the representation to fuse the left and the right context, which allows BERT to pretrain a deep bidirectional Transformer. In addition to the masked language model, BERT also use a “next sentence prediction” task that jointly pre- trains text-pair representations (Google, 2018).\n",
    "\n",
    "## BERT Implementation\n",
    "BERT’s model architecture is a multi-layer bidirectional Transformer encoder based on the original implementation de- scribed in Vaswani et al. (2017) and released in the ```tensor2tensor``` library. \n",
    "There are two steps to implement BERT, pre-training and fine-tuning.\n",
    "\n",
    "### Pre-training\n",
    "1. A [CLS] token is inserted at the beginning of the first sentence and a [SEP] token is inserted at the end of each sentence.    \n",
    "2. A sentence embedding indicating Sentence A or Sentence B is added to each token. Sentence embeddings are similar in concept to token embeddings with a vocabulary of 2.    \n",
    "3. A positional embedding is added to each token to indicate its position in the sequence. The concept and implementation of positional embedding are presented in the Transformer paper.\n",
    "\n",
    "![BERT input](input.png)\n",
    "*BERT input representation. The input embeddings are the sum of the token embeddings, the segmenta- tion embeddings and the position embeddings (Google, 2018)*\n",
    "\n",
    "### Fine-tuning\n",
    "Fine-tuning is straightforward since the self-attention mechanism in the Transformer allows BERT to model many downstream tasks whether they involve single text or text pairs by swapping out the appropriate inputs and outputs. BERT uses the self-attention mechanism to unify these two stages, as encoding a concatenated text pair with self-attention effectively includes bidirectional cross attention between two sentences.\n",
    "\n",
    "For each task, simply plug in the task-specific inputs and outputs into BERT and fine-tune all the parameters end-to-end. At the input, sentence A and sentence B from pre-training are analogous to \n",
    "1. sentence pairs in paraphrasing    \n",
    "2. hypothesis-premise pairs in entailment    \n",
    "3. question-passage pairs in question answering    \n",
    "4. a degenerate text-∅ pair in text classification or sequence tagging   \n",
    "\n",
    "At the output, the token rep- resentations are fed into an output layer for token level tasks, such as sequence tagging or question answering, and the \\[CLS\\] representation is fed into an output layer for classification, such as entailment or sentiment analysis.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Huggingface Transformers\n",
    "[Huggingface Transformers](https://github.com/huggingface/transformers) (formerly known as pytorch-transformers / pytorch-pretrained-bert) is a state-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. It is a mighty transformer that not only provides NLP architectures like BERT, but also some other models like XLM, DistillBert, XLNet, CTRL, and many more with over 32+ pretrained models in 100+ languages. To use Huggingface Transformers, you must have the Tensorflow 2.0 and PyTorch set up because it has deep interoperability between them.\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set up\n",
    "If you haven't set up Tensorflow and/or Pytorch, refer to the [Tensorflow Install Instruction Page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and [Pytorch Install Instruction Page](https://pytorch.org/get-started/locally/#start-locally) to install them. Or follow the below instruction on installing them. \n",
    "\n",
    "Before installing Tensorflow, make sure the pip and python version are   \n",
    "**Python > 3.4 and pip >= 19.0**\n",
    "\n",
    "    $ python3 --version    \n",
    "$ pip3 --version \n",
    "    (or $ pip --version)\n",
    "    \n",
    "    \n",
    "Install Tensorflow 2.0:\n",
    "\n",
    "    $ pip install --upgrade tensorflow\n",
    "    \n",
    "Verify the install:\n",
    "\n",
    "    $ python -c \"import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))\"\n",
    "    \n",
    "To install Pytorch, run the command on the [Pytorch Install Instruction Page](https://pytorch.org/get-started/locally/#start-locally), under the \"Start Locally\" section and find the section \"Run this Command:\", based on the environment of your computer. To verify the successful installation of Pytorch, run the following python code:\n",
    "\n",
    "```python\n",
    ">>> import torch\n",
    ">>> x = torch.rand(5, 3)\n",
    ">>> print(x)\n",
    "```\n",
    "\n",
    "Result should be something similar to \n",
    "```\n",
    "tensor([[0.3380, 0.3845, 0.3217],  \n",
    "        [0.8337, 0.9050, 0.2650],  \n",
    "        [0.2979, 0.7141, 0.9069],  \n",
    "        [0.1449, 0.1132, 0.1375],  \n",
    "        [0.4675, 0.3947, 0.1426]])\n",
    "```\n",
    "\n",
    "\n",
    "\n",
    "When TensorFlow 2.0 and PyTorch has been installed, 🤗Transformers can be installed using pip as follows:  \n",
    "\n",
    "    $ pip install transformers\n",
    "\n",
    "Dataset: Tensorflow dataset. Follow the [instruction of installing Tensorflow Dataset](https://www.tensorflow.org/datasets/overview) or do  \n",
    "    \n",
    "    $ pip install tensorflow-datasets\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "===setting up===\n",
      "PyTorch imported\n",
      "Tensorflow imported\n",
      "tfds imported\n",
      "🤗 imported\n",
      "===setting done===\n"
     ]
    }
   ],
   "source": [
    "# set up\n",
    "print(\"===setting up===\")\n",
    "import torch\n",
    "print(\"PyTorch imported\")\n",
    "import tensorflow as tf\n",
    "print(\"Tensorflow imported\")\n",
    "import tensorflow_datasets as tfds\n",
    "print(\"tfds imported\")\n",
    "from transformers import *\n",
    "print(\"🤗 imported\")\n",
    "import matplotlib.pyplot as plt\n",
    "print(\"===setting done===\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using Tensorflow Dataset - GLUE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's import the data. Tensorflow has built in variuos popular machine learning and deep learning datasets, in the area of audio, image, structured, text, translate, and video. \n",
    "\n",
    "GLUE belongs to the text data that is a paraphrasing NLP task that has the function of semantic analysis. It is a sentence or sentence-pair language understanding tasks. \n",
    "\n",
    "This tutorial uses the ```glue/mrpc```, the Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005). It is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are ```semantically equivalent```. In a nutshell, it is a ```paraphrasing task``` to understand two sentence semantically.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```tfds.load``` is a convenience method that's the simplest way to build and load a tf.data.Dataset. tf.data.Dataset is the standard TensorFlow API to build input pipelines. If this data has originally downloaded before, it will reuse the data from previous downloaded location, otherwise, it will take a few minutes to download the dataset. \n",
    "\n",
    "Under the folder where the dataset is downloaded, the files look something similar as these:\n",
    "\n",
    "```markdown\n",
    "dataset_info.json\t\t\tglue-validation.tfrecord-00000-of-00001\n",
    "glue-test.tfrecord-00000-of-00001\tlabel.labels.txt\n",
    "glue-train.tfrecord-00000-of-00001\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:absl:Overwrite dataset info from restored data version.\n",
      "INFO:absl:Reusing dataset glue (/Users/rainy/tensorflow_datasets/glue/mrpc/0.0.2)\n",
      "INFO:absl:Constructing tf.data.Dataset for split None, from /Users/rainy/tensorflow_datasets/glue/mrpc/0.0.2\n",
      "WARNING:absl:Warning: Setting shuffle_files=True because split=TRAIN and shuffle_files=None. This behavior will be deprecated on 2019-08-06, at which point shuffle_files=False will be the default for all splits.\n"
     ]
    }
   ],
   "source": [
    "# load the glue/mrpc data from Tensorflow dataset, with the dataset_info.json file\n",
    "# this might take a few minutes to download the dataset from the source\n",
    "data, data_info = tfds.load('glue/mrpc', with_info = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "FeaturesDict({\n",
       "    'idx': Tensor(shape=(), dtype=tf.int32),\n",
       "    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),\n",
       "    'sentence1': Text(shape=(), dtype=tf.string),\n",
       "    'sentence2': Text(shape=(), dtype=tf.string),\n",
       "})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data inspection\n",
    "data_info.features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_info.features['label'].num_classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'test': <tfds.core.SplitInfo num_examples=1725>,\n",
       " 'train': <tfds.core.SplitInfo num_examples=3668>,\n",
       " 'validation': <tfds.core.SplitInfo num_examples=408>}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_info.splits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# to access three splitted dataset\n",
    "train = data['train']\n",
    "test = data['test']\n",
    "validation = data['validation']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tensorflow dataset uses ```iterator``` to loop the data. To view the example of the dataset, iterator is called."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor(201, shape=(), dtype=int32)\n",
      "tf.Tensor(b'Tibco has used the Rendezvous name since 1994 for several of its technology products , according to the Palo Alto , California company .', shape=(), dtype=string)\n",
      "tf.Tensor(b'Tibco has used the Rendezvous name since 1994 for several of its technology products , it said .', shape=(), dtype=string)\n",
      "tf.Tensor(1, shape=(), dtype=int64)\n",
      "=========================\n",
      "tf.Tensor(2977, shape=(), dtype=int32)\n",
      "tf.Tensor(b\"Most of the alleged spammers engaged in fraudulent or deceptive practices , said Brad Smith , Microsoft 's senior VP and general counsel .\", shape=(), dtype=string)\n",
      "tf.Tensor(b'\" Spam knows no borders , \" said Brad Smith , Microsoft \\'s senior vice-president and general counsel .', shape=(), dtype=string)\n",
      "tf.Tensor(0, shape=(), dtype=int64)\n",
      "=========================\n",
      "tf.Tensor(3482, shape=(), dtype=int32)\n",
      "tf.Tensor(b'Yesterday , Taiwan reported 35 new infections , bringing the total number of cases to 418 .', shape=(), dtype=string)\n",
      "tf.Tensor(b'The island reported another 35 probable cases yesterday , taking its total to 418 .', shape=(), dtype=string)\n",
      "tf.Tensor(1, shape=(), dtype=int64)\n",
      "=========================\n"
     ]
    }
   ],
   "source": [
    "# set up the iterator from the training set\n",
    "iterator = train.__iter__()\n",
    "# iterate the glue dataset to get a snippet of the dataset\n",
    "for i in range(3):\n",
    "    next_element = iterator.get_next()\n",
    "    print(next_element['idx'])\n",
    "    print(next_element['sentence1'])\n",
    "    print(next_element['sentence2'])\n",
    "    print(next_element['label'])\n",
    "    print(\"=========================\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor(339, shape=(), dtype=int32)\n",
      "tf.Tensor(b'Investment bank Merrill Lynch raised its investment rating on the business software maker Oracle to \" buy \" from \" neutral \" with a 12-month price target of $ 15 .', shape=(), dtype=string)\n",
      "tf.Tensor(b'Merrill Lynch upgraded the business software maker to \" buy \" from \" neutral \" with a 12-month price target of $ 15 .', shape=(), dtype=string)\n",
      "tf.Tensor(-1, shape=(), dtype=int64)\n",
      "=========================\n",
      "tf.Tensor(598, shape=(), dtype=int32)\n",
      "tf.Tensor(b'\" The Leading Economic Index finally points to a recovery , almost a year and a half after the end of the recession , \" Conference Board economist Ken Goldstein said .', shape=(), dtype=string)\n",
      "tf.Tensor(b'Conference Board economist Ken Goldstein said the improved reading \" finally points to a recovery , almost a year and a half after the end of the recession .', shape=(), dtype=string)\n",
      "tf.Tensor(-1, shape=(), dtype=int64)\n",
      "=========================\n",
      "tf.Tensor(300, shape=(), dtype=int32)\n",
      "tf.Tensor(b\"The ruling ``is so wrong that we are extremely confident that it will not withstand our appeal to the Sixth Circuit , ' ' Taubman Centers said in a statement .\", shape=(), dtype=string)\n",
      "tf.Tensor(b'The statement concluded , \" This ruling is so wrong that we are extremely confident that it will not withstand our appeal \" to the 6th U.S. Circuit Court .', shape=(), dtype=string)\n",
      "tf.Tensor(-1, shape=(), dtype=int64)\n",
      "=========================\n"
     ]
    }
   ],
   "source": [
    "# iterator for testing set\n",
    "iterator = test.__iter__()\n",
    "# iterate the glue dataset to get a snippet of the dataset\n",
    "for i in range(3):\n",
    "    next_element = iterator.get_next()\n",
    "    print(next_element['idx'])\n",
    "    print(next_element['sentence1'])\n",
    "    print(next_element['sentence2'])\n",
    "    print(next_element['label'])\n",
    "    print(\"=========================\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To summarize, glue is a NLP task to do the sentence understanding. With 0 meaning sentence 1 and sentence 2 are not the paraphase and 1 meaning that they are paraphrase. The label for testing set is masked as -1. Now, let's look at how to use Huggingface Transformers to build a BERT model to predict whether two sentences are paraphrases!\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# BERT using Huggingface Transformers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step is to load the model. Huggingface Transformer has 8 transformer architectures and 30 pretrained weights, including BERT, OpenAIGPT, GPT2, TransfoXLModel, XLNet, XLM. Transformers use a single API to call these language models. The BERT Model name is ```BertModel```,  tokenizer class is ```BertTokenizer```, and pretrained weights shortcut is ```'bert-base-uncased'```."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To understand how each element of the model works in Huggingface Transformers, let's first encode some text to see what does transformers transform. First, set up the tokenizer for the model as ```tokenizer = tokenizer_class.from_pretrained(pretrained_weights)```. Then use ```tokenizer.encode(text)``` to encode text, and some models require adding special tokens (```add_special_tokens = True```). BERT, as introduced above, requires adding \\[CLS\\], \\[SEP\\] to text, where \\[CLS\\] is encoded as 101 and \\[SEP\\] is encoded as 102."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "text encoding without special tokens: [153, 18890, 2233, 2598, 5315, 18922]\n",
      "text encoding after adding special tokens for BERT: tensor([[  101,   153, 18890,  2233,  2598,  5315, 18922,   102]])\n"
     ]
    }
   ],
   "source": [
    "# To use TensorFlow 2.0 versions of the models, simply prefix the class names with 'TF', \n",
    "# e.g. `TFRobertaModel` is the TF 2.0 counterpart of the PyTorch model `RobertaModel`\n",
    "\n",
    "# encode some text snippets to see how bert tokenizer works\n",
    "tokenizer = BertTokenizer.from_pretrained('bert-base-cased')\n",
    "input_ids = torch.tensor([tokenizer.encode(\"Practical data science requires encoding\", add_special_tokens=True)])  \n",
    "\n",
    "print(\"text encoding without special tokens:\", tokenizer.encode(\"Practical data science requires encoding\", add_special_tokens=False))\n",
    "print(\"text encoding after adding special tokens for BERT:\", input_ids)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a briefing of all bert model classes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Each architecture is provided with several class for fine-tuning on down-stream tasks, e.g.\n",
    "BERT_MODEL_CLASSES = [BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction,\n",
    "                      BertForSequenceClassification, BertForMultipleChoice, BertForTokenClassification,\n",
    "                      BertForQuestionAnswering]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up BERT Model for Classification\n",
    "\n",
    "Now let's train the BERT on GLUE dataset! \n",
    "\n",
    "First, load tokenizer, model from Huggingface Transformer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load tokenizer, model from pretrained model/vocabulary\n",
    "tokenizer = BertTokenizer.from_pretrained('bert-base-cased')\n",
    "model = TFBertForSequenceClassification.from_pretrained('bert-base-cased') # TF added for showing the usage of tensorflow 2.0 version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<transformers.modeling_tf_bert.TFBertForSequenceClassification at 0x1a4614d940>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to convert our data into something that BERT can understand. \n",
    "\n",
    "BERT expects data as a .tsv file with four columns: id, label, a column with the same alphabet, text. Then we need to convert data to features. Huggingface Transformers has a built-in converter for glue dataset so let's embrace it! The dataset results after conversion are Tensorflow dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare dataset for GLUE as a tf.data.Dataset instance\n",
    "train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')\n",
    "valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')\n",
    "train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)\n",
    "valid_dataset = valid_dataset.batch(64)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, after the model and dataset are set, we need to prepare the training with tensorflow's keras model with optimizer, loss, and learning rate schedule."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule \n",
    "optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)\n",
    "loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n",
    "metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')\n",
    "model.compile(optimizer=optimizer, loss=loss, metrics=[metric])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alright... now we can finally train our BERT model!\n",
    "\n",
    "For the below report, the epoches are set to 2 and steps per epoch are set to 115, with the validation steps set to 7. You can change these parameters, but with this setting it cost my computer **50 minutes** to train the model (see below report for detail). To increase the model performance but may at the cost of accuracy, you can decrease the steps per epoch."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```Model performance report for 2 epochs at step size = 115```\n",
    "```markdown\n",
    "Train for 115 steps, validate for 7 steps\n",
    "Epoch 1/2\n",
    "115/115 \\[==============================\\] - 1515s 13s/step - loss: 0.5822 - accuracy: 0.6933 - val_loss: 0.4512 - val_accuracy: 0.7917\n",
    "Epoch 2/2\n",
    "115/115 \\[==============================\\] - 1499s 13s/step - loss: 0.3312 - accuracy: 0.8550 - val_loss: 0.4058 - val_accuracy: 0.8309\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To save some time and re-use the model for future use, you can use ```model.save_pretrained(directory)``` and save the model to a place in your directory. Alternatively, you may import the model anytime using ```model.from_pretrained(directory, from_tf=True)```. This tutorial will not include the pre-trained BERT model simply because it's size is over 400MB and exceed the requirement limit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train for 25 steps, validate for 6 steps\n",
      "Epoch 1/6\n",
      "25/25 [==============================] - 389s 16s/step - loss: 0.4844 - accuracy: 0.8037 - val_loss: 0.5792 - val_accuracy: 0.7318\n",
      "Epoch 2/6\n",
      "25/25 [==============================] - 394s 16s/step - loss: 0.5309 - accuracy: 0.7475 - val_loss: 0.5240 - val_accuracy: 0.7578\n",
      "Epoch 3/6\n",
      "25/25 [==============================] - 375s 15s/step - loss: 0.5382 - accuracy: 0.7425 - val_loss: 0.4910 - val_accuracy: 0.7839\n",
      "Epoch 4/6\n",
      "25/25 [==============================] - 377s 15s/step - loss: 0.4638 - accuracy: 0.7975 - val_loss: 0.4382 - val_accuracy: 0.8047\n",
      "Epoch 5/6\n",
      "25/25 [==============================] - 384s 15s/step - loss: 0.3969 - accuracy: 0.8147 - val_loss: 0.4756 - val_accuracy: 0.7995\n",
      "Epoch 6/6\n",
      "25/25 [==============================] - 391s 16s/step - loss: 0.2941 - accuracy: 0.8875 - val_loss: 0.4644 - val_accuracy: 0.7969\n"
     ]
    }
   ],
   "source": [
    "# Train and evaluate using tf.keras.Model.fit()\n",
    "# **************************IMPORTANT*******************: this takes about half an hour to train. \n",
    "# If you don't mind waiting, you can boil some water, and prepare for a few cups of tea.\n",
    "# Else, please proceed without running this cell.\n",
    "history = model.fit(train_dataset, epochs = 6, steps_per_epoch = 25,\n",
    "                    validation_data=valid_dataset, validation_steps=6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's visualize the result. With 6 epochs, step size = 25, and validation steps = 6 for each epoch, the highest accuracy on training set is 0.89. From the BERT paper, the highest accuracy score of BERT on glue, mrpc dataset is **89.3** (Appendix). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAEWCAYAAACnlKo3AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nO3dd3hUVfrA8e+bDgQSauhNOkhLxIIFFF1soCuroqKuBV3FtrqrLnZZdf2tZVHW7qqgAhaUVRQVRRcbJAGpUkQgobckBEh/f3/cGx1iQiZlcmcy7+d58mRmbnvPzJ15555z5hxRVYwxxpjqiPA6AGOMMaHLkogxxphqsyRijDGm2iyJGGOMqTZLIsYYY6rNkogxxphqsyRi/CYil4nIR7W9brARkUdEZLeIZHodS0VEZJqI3Od1HBURkREissHrOLwmIpEikisiHb2OJVBCLomIyHwR2SsisV7HEsxE5GL35M0VkYMiUuJzP7c6+1TVV1X19Npet6pE5EQR+VZEskVkj4gsEJHBfmwXJSIqIp0Ps04X4Eagp6q2r72ovSMi3dxy55b5O8/r2PzhT0ISkY4iMktEdrnnxTIRGecu6yYiAftBnIg0FZFXRGSbiOSIyGoRuQ1AVYtVNV5VNwXq+FWI8zgR+cx9z+wUkRkikuSzfJKIFJY5RypNfiGVRNw3/wmAAqPq+NhRdXm8mlLV192TNx44HdhSet997BChUj4RaQrMBh4HmgLtgUlAQS0dohOwQ1V3VSO2oH4OfV9/9+8dr2OqRa8D64GOQHPgMmBHHR17MhAD9AISgXPcWIJNU+AZnHO8M5AHvFRmndfLnCOVJz9VDZk/4B7ga5wPkA/KLGsAPAZsBLKBBUADd9nxwDdAFpABXO4+Ph+4ymcflwMLfO4rcD2wFvjZfexf7j5ygDTgBJ/1I4G/AT8B+9zlHYApwGNl4v0vcHMF5TwOWOSWYxFwnM+y+cCD7vOwD/gEaFHJ8zYMyCzn8UzgL8AyoMB97C6cN8A+YAUwymf9q4D57u0o9/m5BlgH7AUmV3PdSOBJYLd77BucU7PcshwD7KqkvFcBP7rH+Qjo4D7+jRvHfiAXOK/MdiOBg0CJu/xF9/Fz3OciC/gc5yqlwuewnHj6AJ8Be9y4zvNZNgpY4j7fm4C7y2x7IvCdey5kAOPcx6fhfHh95G77LdClguN3q+j59NnXFGCeu68vSp8zn/dPqhvDQuBon2XNgVeAre7z/Y77+AhgA/BXYCewBbi0ktdslXv8n3Dfl0BCmdckF2hVzvZ5QL8K9r3Ffd1Ltz+qkvOk9Hy9AfgZ2AU8AkRUsP8fgbMqWFa6r844CS7X5+8AUFTZeRuoP2AIsNfn/iTglSrvJ5BBBqDQ64DrgGSgEEjyWTYF5wO2Hc6H0nFArPvC7QPGAtHuST/Q3WY+lSeRT4Fm/JqQLnH3EQXcCmwD4txlpR8mPQEBBrjrDnFP5Ah3vRbuCZRUThmbuSfROPcYY937zX1i/gnogZM45wOPVPK8DaPiJJKG822+tHznA21wrlIvck/2JJ+TfH6ZN8f7OG/0zjgfkiOqse4EYLn72jXD+RDTCsrS1H0+/oPzoZ9YZvkYYLX7GkQB9wH/K/uGPsxzNQLY4HO/t/scnOyeP38D1gDRFT2HZfbXGNgMXOoePxknWfZ0l58M9HOf7wE4H1hnucu64Jy757vbtuDXc3eau26KG9cMYFoFZfIniWQDQ3HeM1N8XrsW7rKxbgyXuPE3dZfPBd5wX5cY4ESf57EIuNeNbxRO8m5SQQxnA11x3jcn4ySO/uW9JhVsPx/4H3ABZT58yyu/n+fJZ265OuN89lxewbFfwXnfXw50L7OswnPOfc2mVhZPOdtF4nyhqejvNj8/T2/j0M+7Se72e3Dej9f4tR9/VgqGP5xvQ4W437pxMvYt7u0I96QbUM52dwKzDnPiVZZETq4krr2lx3VPgtEVrLcKONW9PQGYU8F644CFZR77lkOvnu7yWXYd8HElMQ6j4iRS4bdDd53lwJnu7fISwzE+675begJXcd2vgCt9lo3k8B96fYFXcT6cC4H3gJbusk+By3zWjQLycRJUdZLI/cAbPvcjcL44HO/PcwhcDHxR5rGXgIkVrP808H/u7buBtypYbxrwrM/9UcDyCtbt5pa77IdNd599TfNZPwHnm38b4I/AN2X2twgnmXTASRQJFTyPuUCkz2N7gJTDnW8+634AXF/ea1LB+s2AR4GVbuzpQLJv+cus7895MsJn+Y3A3AqO3RDnCj7dfT7WAqeVOfc7l9lmovs8xlUWjz/PV1X/gEE4n12+tRx93dc8Eufzdjvwh8r2FUptIpcBn+ivddVvuI+B820pDucbelkdKnjcXxm+d0TkVhFZ5TbeZeG84Vr4caxXcd54uP+nVrBeW5wqOV8bcU7uUtt8bh8AftPGUQVly3e5iPwgIllu+Xrxa/nKU5VYKlq3bZk4DompLFVdoaqXqWo7oD/O1ebj7uJOwBSf+HfhfKhUt5H8kNdDVUtwEofv63G4eDsBQ0vjcWO6AOfNiogc63YW2Ski2TjJ15/zCap4HqhqYpm/teWVQVWzca4+2nL487EDTtVidgWH3KWqxf7EKCJnicj3bqNvFnAahz/vypZtj6r+VVX7AEk41Y+zDrOJP+eJ7+u6Eee5KO/YB1R1kqoOxql5eBd4R0QSyltfRM7G+fJ3jqrmVSGeWiEiPYAPcZL0Nz7lWKGqW9XpDLAAeArnCumwQiKJiEgDnEv6k9weENuAW4ABIlJaBZAHHFHO5hkVPA7O5XVDn/uty1lHfeI4AbjdjaWpqibivNnEj2NNA0a78fbG+fZcni04J5SvjjjfugPBt3xdcRre/oRTfZaIc8UnFWxbW7Zy6Julg78bquoq4DWcKiFwXoMry3xYNlDV7/EpaxUc8nqISIQbq+/rcbj9ZgDzysQTr6oT3OXTgXdwqmASgBfx73yqbb885+6HXwJO2Q93PmYALUSkSU0O7L6/3wYexqk6TcRp6yt9Hqr0uqnqTpz20Q5uWcrb/nDnSSnf87AjznNR2bGz3XLE41SDHUJEegMvA2NU1fcc8iee0n2Udhuu6O+vFcXn9j78DLhXVd+orDj48d4PiSSC07BZjNNAOdD9641TB3qp++3wZeBxEWnrPsnHut2AXwdGiMj5bhfP5iIy0N3vEuD3ItJQRLoBV1YSR2Ocy9WdQJSI3AP4voFeBB4Uke7i6C8izQFUNRPn8nUqTuPjwQqOMQfoISIXufFe4Jb7A3+frBqIxzlxdgIiIlfhXIkE2kzgZve1a4rTtlQuEekjIn8WkXbu/Y7AhTiNzwDPAhPdNysikigiY8DpbolTn9+1irGNEpFhIhLtxrYP+M2buwKzgb7u6xnt/g0RkZ7u8sbAHlXNE5Fj3LKUmgaMFJHz3HOhhfslJBDO9nnPTMKp1t2Kc971FZEL3BguwqkemqOqGTgfSFPc5zlaRE6sxrFjcdpTdgLFInIWcIrP8u04yapxRTsQkUdFpK/73m+C80XoR/dDfQeg7pekUhWeJz7+6j7eEac6a0YFx75XRFJEJEZE4tx19+BUa/mul4jTLni7qn5bZjf+xAMc0m24or9HK4izA07HkMdV9YVylp/jHldE5Gicavf3y9uXr1BJIpcB/1HVTaq6rfQPp/74YnG6Vt6G07i1COcF/AdOQ/Ym4AycRvA9OImj9I34BE7X0O041U2vVxLHXJxeE2twLm/zOPSS93GcD51PcHpvvYTT+F3qVeBIKq7KQlV3A2e58e7G6d1yllajy2lVqepSnB4/C3GuDnrh/4dlTTyD09azDKeR+kMq7rK7DzgWWCQi+3F6XC3BeZ5Q1bdwXoe3RCQHWAr8zmf7e4E33GqD31cWmKquwDn/nsH5kBuJ02Ot0J+CuR9iv8OpwtyKUwX1MM4HJzgfdg+LyD6cRvuZPtv+jNPgfDvOuZuOc/5USznfWG/0WTwNJ3nswqkiHOfGsBOnveV2nPPxFpzzcY+7XWkV7Rqc99ENVY1LVbPc/c7CKecYfL40qepynKu1De7r1qqc3cTjfOBl41QBtsX58omq7sN5zr93t0/x4zwBpwflEmCxG9srhynGqzjPzxacNsgzVfVAmXVSgO7AZJ/XIMuN0Z94amo8ztXRpLLHd13Erz0zXwUmqWpln4mI26Bi6oD7LW0aTiNbidfxBCtx6oyfVNW6qsoJayIyDVinqvd5HUswcL+UFuJ0md7gcThBL1SuREKeWxVyE85vDyyB+BCRRiIy0q2KaI/ze6DDNYoaY4KEJZE64NZzZuH0yHnS43CCkQB/x6mKSMO5lL/f04iMMX6x6ixjjDHVZlcixhhjqi2oB4yrihYtWmjnzp2rvf3+/ftp1KhR7QUUAsKtzOFWXrAyh4ualDktLW2Xqras7rHrTRLp3Lkzqamp1d5+/vz5DBs2rPYCCgHhVuZwKy9YmcNFTcosImVHJKgSq84yxhhTbZZEjDHGVJslEWOMMdVmScQYY0y1WRIxxhhTbZZEjDEmhO3IyeOh7w+yY19e5SsHgCURY4wJYZPnrWXt3hImz1vnyfEtiRhjTIjakZPHjNQMFHg7NcOTqxFLIsYYE6IenfsjhcXO+IfFqp5cjVgSMcaYELQjJ49303+dYbewWD25GrEkYowxIWjirGWUlBmE3YurEUsixhgTYgqKSvhqzW9nzC4sVtI37q3TWOrNAIzGGBMuXlrwM/nFJbx4aQoj+iR5OuikXYkYY0wIydx7gMnz1nJanyRG9EnyOhxLIsYYE0rum70SgHtH9fU4EkdAk4iIjBSR1SKyTkTuKGd5RxH5QkQWi8hSETnDfTxaRF4VkWUiskpE7gxknMYYEwo+Xbmdz1Zt5+YR3WmX2MDrcIAAJhERiQSmAKcDfYCxItKnzGp3ATNVdRBwIfBv9/E/ALGqeiSQDFwjIp0DFasxxgS7AwVF3Dd7BT2TGnPF8V28DucXgbwSGQKsU9X1qloATAdGl1lHgSbu7QRgi8/jjUQkCmgAFAA5AYzVGGOC2uR569icdZBJ5/YjOjJ4WiJEVStfqzo7FhkDjFTVq9z744CjVXWCzzptgE+ApkAjYISqpolINDAVOAVoCNyiqs+Xc4zxwHiApKSk5OnTp1c73tzcXOLj46u9fSgKtzKHW3nBylxfbN5Xwj3fHOS4tlFceWTsb5bXpMzDhw9PU9WU6sYWyC6+Us5jZTPWWOAVVX1MRI4FpopIP5yrmGKgLU6C+Z+IfKaq6w/ZmZNYngdISUnRmnRxs3mZ679wKy9YmesDVeWC57+jcYMi/nXFMJo1ivnNOvW1i28m0MHnfnt+ra4qdSUwE0BVvwXigBbARcDHqlqoqjuAr4FqZ0pjjAlV76RvZuHPe7hjZK9yE4jXAplEFgHdRaSLiMTgNJzPLrPOJpwqK0SkN04S2ek+frI4GgHHAD8GMFZjjAk6WQcKeGjOKgZ3TOT8lA6Vb+CBgCURVS0CJgBzgVU4vbBWiMgDIjLKXe1W4GoR+QF4E7hcnUaaKUA8sBwnGf1HVZcGKlZjjAlG//h4NdkHC/n7uUcSEVFeC4H3AjrsiarOAeaUeewen9srgaHlbJeL083XGGPCUvqmvby5cBNXHd+F3m2aVL6BR4Knn5gxxhgAiopLmDhrOa2bxHHzqT28DuewLIkYY0yQeeWbDazamsO9Z/chPja4x8m1JGKMMUFka/ZBnvh0DcN6tmRkv9Zeh1MpSyLGGBNEHvxgJUUlygOj+iESnI3pviyJGGNMkJi/egdzlm3jhpO70bF5Q6/D8YslEWOMCQJ5hcXc8/4KurZsxNUndvU6HL8Fd4uNMcaEiX9/sY5New7wxlVHExsV6XU4frMrEWOM8dj6nbk8++V6zhnYluO6tfA6nCqxJGKMMR5SVe5+fzmx0RFMPLPslEvBz5KIMcZ4aPYPW/h63W7++ruetGz822Heg50lEWOM8UhOXiGTPlxF//YJXHR0J6/DqRZrWDfGGI88Nnc1u3Pzefmyo4gM0gEWK2NXIsYY44FlmdlM/W4j447pxJHtE7wOp9osiRhjTB0rLlEmvreM5vGx3Pq7nl6HUyOWRIwxpo698f1GlmZmc9eZvWkSF+11ODViScQYY+rQjn15PDp3NUO7NWfUgLZeh1NjlkSMMaYOPfThKvILS3hwdGgMsFgZSyLGGFNHvl63i/eWbOHak7rStWW81+HUCksixhhTB/KLirn7veV0bNaQ64Z38zqcWhPQJCIiI0VktYisE5E7ylneUUS+EJHFIrJURM7wWdZfRL4VkRUiskxE4gIZqzHGBNLzX65n/a79PDC6L3HRoTPAYmUC9mNDEYkEpgCnApnAIhGZraorfVa7C5ipqs+ISB9gDtBZRKKAacA4Vf1BRJoDhYGK1RhjAmnT7gM8/cU6zjyyDcN6tvI6nFoVyCuRIcA6VV2vqgXAdGB0mXUUaOLeTgC2uLdPA5aq6g8AqrpbVYsDGKsxxgSEqnLP7OVERQh3nxV6AyxWRlQ1MDsWGQOMVNWr3PvjgKNVdYLPOm2AT4CmQCNghKqmicjNQDLQCmgJTFfVR8s5xnhgPEBSUlLy9OnTqx1vbm4u8fH1o6HLX+FW5nArL1iZg8GibUVMWZLP2F4x/K5zYH4TUpMyDx8+PE1VU6p77ECOnVVe37WyGWss8IqqPiYixwJTRaSfG9fxwFHAAWCeiKSp6rxDdqb6PPA8QEpKig4bNqzawc6fP5+abB+Kwq3M4VZesDJ7LTe/iDse+5I+bZrw4LihREUGpvLHyzIHsjorE+jgc789v1ZXlboSmAmgqt8CcUALd9svVXWXqh7AaSsZHMBYjTGm1j356Rq278tj0rn9ApZAvBbIUi0CuotIFxGJAS4EZpdZZxNwCoCI9MZJIjuBuUB/EWnoNrKfBKzEGGNCxKqtOfznmw1ceFRHBnds6nU4AROw6ixVLRKRCTgJIRJ4WVVXiMgDQKqqzgZuBV4QkVtwqrouV6eRZq+IPI6TiBSYo6ofBipWY4ypTSUlysRZy0hsEM3tI0N7gMXKBHQ+EVWdg1MV5fvYPT63VwJDK9h2Gk43X2OMCSkzUzNI35TFP/8wgMSGMV6HE1D1s5LOGGM8smd/AY98/CNDujTjvMHtvA4n4CyJGGNMLXp4zipy84qYdE79GGCxMpZEjDGmlizasIe30jK56oSu9Ehq7HU4dcKSiDHG1ILC4hImzlpGu8QG3HhK/RlgsTIBbVg3xphw8dKCn1mzPZcXLk2hYUz4fLTalYgxxtRQ5t4D/OuztYzoncSpfZK8DqdOWRIxxpgauv+/zm+h7xtV/wZYrIwlEWOMqYHPVm7n05XbuWlEd9o3beh1OHXOkogxxlTTgYIi7p29gh5J8Vx5fBevw/FE+LT+GGNMLXvq83VszjrIzGuOJbqeDrBYmfAstTHG1NDa7ft44av1jEluz5AuzbwOxzOWRIwxpopUlbveW06j2CjuPL2X1+F46rBJRBwdDreOMcaEm3fTN/P9z3u44/ReNI+P9TocTx02ibjDsr9XR7EYY0zQyzpQwENzVjG4YyIXpNh3bH+qs74TkaMCHokxxoSAR+euJutgIZPOOZKIiPo/wGJl/OmdNRy4VkQ2APtx5k5XVe0fyMCMMSbYLN60lzcXbuKKoV3o07aJ1+EEBX+SyOkBj8IYY4JcUXEJE2ctJ6lxHLec2sPrcIJGpdVZqroR6ACc7N4+4M92xhhTn7z27UZWbs3hnrP7EB9rP7ErVWkyEJF7gduBO92HovFz2loRGSkiq0VknYjcUc7yjiLyhYgsFpGlInJGOctzReQ2f45njDGBsD0nj8c/XcNJPVpyer/WXocTVPy5ojgXGIXTHoKqbgEqnW1FRCKBKTjVYX2AsSJSdnSyu4CZqjoIuBD4d5nlTwAf+RGjMcYEzAMfrKSwuIQHRvcNi9kKq8KfJFLgdvVVABFp5Oe+hwDrVHW9qhYA04HRZdZRoLR1KgHYUrpARM4B1gMr/DyeMcbUui/X7OTDpVu5fng3OjX39+MvfIiTHw6zglOV1B04FXgYuAJ4Q1WfqmS7McBIVb3KvT8OOFpVJ/is0wb4BGgKNAJGqGqam6g+c495G5Crqv8s5xjjgfEASUlJydOnT/er0OXJzc0lPj6+2tuHonArc7iVF6zMNVVQrNz19UEigAePb0B0kHbprUmZhw8fnqaqKdU9dqWtQ6r6TxE5FcgBegD3qOqnfuy7vGe7bMYaC7yiqo+JyLHAVBHpB9wPPKGquYe7dFTV54HnAVJSUnTYsGF+hFW++fPnU5PtQ1G4lTncygtW5pp6/NM17DiwljeuOprjurWolX0Ggpevs79dDJYBDXCSwDI/t8nE6dVVqj0+1VWuK4GRAKr6rYjEAS2Ao4ExIvIokAiUiEieqj7t57GNMaZG1u/M5dn5PzF6YNugTiBe86d31lXAQuD3wBicX7Bf4ce+FwHdRaSLiMTgNJzPLrPOJuAU9zi9gThgp6qeoKqdVbUz8CTwkCUQY0xdUVXueX8FsdERTDyzt9fhBDV/rkT+AgxS1d0AItIc+AZ4+XAbqWqRiEwA5gKRwMuqukJEHgBSVXU2cCvwgojcgnOVc7lW1khjjDEB9t+lW1mwbhcPjO5Lq8ZxXocT1PxJIpnAPp/7+4AMf3auqnOAOWUeu8fn9kpgaCX7uM+fYxljTG3IySvkwQ9W0r99Ahcf3cnrcIJehUlERP7s3twMfC8i7+NcLYzGqd4yxph65/FP1rArN5+XLkshMkh7YwWTw12JlP6g8Cf3r9T7gQvHGGO8s3xzNq99u4Fxx3Sif/tEr8MJCRUmEVW9vy4DMcYYLxWXKBNnLaNZo1huPa2n1+GEjErbREQkBZgIdPJd34aCN8bUJ28s3MQPmdn868KBJDSI9jqckOFPw/rrOD20lgElgQ3HGGPq3s59+Tz68Y8M7dacUQPaeh1OSPEniex0u+MaY0y99NCcVeQXlvDA6H42wGIV+ZNE7hWRF4F5QH7pg6r6bsCiMsaYOvLNT7uYtXgzN5zcjSNahtc4Y7XBnyTyR6AXzjwipdVZClgSMcaEtPyiYu56bzkdmzXk+uHdvA4nJPmTRAao6pEBj8QYY+rYC1+tZ/3O/fznj0cRFx3pdTghyZ/5RL4rZzIpY4wJaZt2H+Cpz9dxxpGtGd6zldfhhCx/rkSOBy4TkZ9x2kQEUOvia4wJVarKvbOXExUh3HNWX6/DCWn+JJGRAY/CGGPq0NwV2/hi9U7uOrM3rRNsgMWa8CeJ2Ki6xph6Y39+Eff/dyW92zTh8uM6ex1OyPMniXyIk0gEZ76PLsBqwK4BjTEh58nP1rA1O4+nLxpMVKQ/zcLmcPyZHveQnlkiMhi4JmARGWNMgKzamsPLX29g7JAOJHdq6nU49UKV07CqpgNHBSAWY4wJmJIS5a73lpPQIJrbR/byOpx6w58BGP/sczcCGAzsDFhExhgTAG+lZZC2cS//N6Y/iQ1jvA6n3vCnTaSxz+0inDaSdwITjjHG1L49+wt4+KMfGdK5GWOS23sdTr3iT5tItecVEZGRwL9w5lh/UVUfKbO8I/AqkOiuc4eqzhGRU4FHgBigAPiLqn5e3TiMMeHtkY9WkZtXxKRzbYDF2uZPdVYP4DagM4fOJ3JyJdtFAlOAU3HmaV8kIrPdedVL3QXMVNVn3F/Fz3GPsws4W1W3iEg/YC7QrgrlMsYYAFI37GFmaibXnNSVHkmNK9/AVIk/1VlvAc8CLwLFVdj3EGCdqq4HEJHpOPOz+yYRBZq4txOALQCquthnnRVAnIjEqmo+xhjjp8LiEibOWk67xAbcdEp3r8Opl0T18L8lFJE0VU2u8o5FxgAjVfUq9/444GhVneCzThvgE6Ap0AgYoapp5eznWlUdUc4xxgPjAZKSkpKnT59e1TB/kZubS3x8eA0DHW5lDrfygpX5o58LmbG6gBsHxTI4yZ/vzKGpJq/z8OHD01Q1pdoHV9XD/gH3AdcBbYBmpX9+bPcHnHaQ0vvjgKfKrPNn4Fb39rE4VykRPsv7Aj8BR1R2vOTkZK2JL774okbbh6JwK3O4lVc1vMu8ee8B7X33R3rlKwu9DagO1OR1BlK1ks/Xw/35k5ovc///xTf3AF0r2S4T6OBzvz1udZWPK3HH5lLVb0UkDmgB7BCR9sAs4FJV/cmPOI0x5hf3/3cFJarce7YNrhFI/vTO6lLNfS8CuotIF2AzcCFwUZl1NgGnAK+ISG+cYVV2ikgiTlfiO1X162oe3xgTpuat2s7cFdu5fWQvOjRr6HU49VrAKglVtUhEJuD0rIoEXlbVFSLyAM7l02zgVuAFEbkF5+rmclVVd7tuwN0icre7y9NUdUeg4jXGhL4dOXn8/buD5OpyureK58rjq/sd2PgroC1NqjoHp9uu72P3+NxeCQwtZ7tJwKRAxmaMqX8mz1vL2qwSII8Z448hJsoGWAw0e4aNMfXCjpw8ZqZmAhAp0KVlI48jCg+VJhEReUdEzhQRSzjGmKBTUFTCR8u2cs6UrykoLgEgIkKYPG+dx5GFB3+qs54B/ghMFpG3gFdU9cfAhmWMMYe3dvs+ZizKYNbizezeX3DIssJi5e3UDG48pRutGtvMhYHkT++sz4DPRCQBGAt8KiIZwAvANFUtDHCMxhgDwL68Qj5YupUZizJYkpFFVIQwoncS+UXFLFi3i8LiX388XazK5HnrmHROPw8jrv/8algXkebAJTg/GFwMvA4cj/MbkmGBCs4YY1SVRRv2MjM1gw+XbuVgYTHdW8Vz15m9OXdQO5rHx3LGv/53SAIB52okfeNej6IOH/4MwPgu0AuYijMo4lZ30QwRSQ1kcMaY8LUjJ4930jfzVmoG63ftp1FMJOcMassfUjowqEPiIaPxzrnphF9uz58/n2HDhnkQcXjy50rkaa1gGHatyXgrxhhTRmFxCV/8uIOZqRl8sXonxSXKUZ2b8qdhR3Bm/zY0jKm/41+FKn9ekd4ikq6qWQAi0hQYq6r/Dmxoxphw8dPOXGYuyuCd9M3sys2nZeNYrj6hK8/6e+IAAB8SSURBVOentKdry/AaQDLU+JNErlbVKaV3VHWviFwNWBIxxlTb/vwiPly2lZmLMkjduJfICOHkXq04P6UDw3u2JCrSflUQCvxJIhEiIu5oj6WTTdkExcaYKlNV0jdlMXNRBh8s3cL+gmK6tmjEHaf34veD21l33BDkTxKZC8wUkWdxxre6Fvg4oFEZY+qVXbn5vJueyczUTNbtyKVhTCRnHtmGC47qQHKnpjZlbQjzJ4ncDlwD/AkQnEmkXgxkUMaY0FdUXMJXa3cyY1EG81btoKhEGdwxkX+cdyRn9m9LfKw1ktcH/vzYsATnV+vPBD4cY0yo27BrPzNTM3gnPZPtOfk0bxTDH4d25vyUDnS3Oc7rHX9+J9IdeBjogzPfBwCqWtmkVCFjR04eD31/kD7JeVYna0w1HCwoZs6yrcxMzeD7n/cQITCsZyvuH9WBk3u1stF06zF/rif/A9wLPAEMxxlHq15VYE6et5a1e0tsiARjqkBV+SEzm5mpGfx3yRb25RfRuXlD/vK7npw3uD2tE+wLWTjwJ4k0UNV5bg+tjcB9IvI/nMQS8rZnH2T6ogwUbMA2Y/ywZ38BsxZvZuaiDFZv30dcdARnHNmG81M6cHSXZtZIHmb8SSJ57jDwa90ZBzcDrQIbVt2Z9OEqikqcMXdswDZjyldcovxv7U7eSs3kk5XbKCxWBnRI5O/n9uPsAW1pEhftdYjGI/4kkZuBhsCNwIM4VVqXBTKourIjJ49PVm7/5b4NH23MoTL2HOCt1AzeTstkS3YeTRtGM+6Yzpx/VHt6tW7idXgmCBw2ibg/LDxfVf8C5OK0h/hNREYC/8KZY/1FVX2kzPKOwKtAorvOHe6UuojIncCVQDFwo6rOrcqx/TF53lpKtMzInyV2NWLCW15hMXNXbGPGogy++Wk3InBi95ZMPLMPI/q0IjYq0usQTRA5bBJR1WIRSfb9xbq/3AQ0BTgVyAQWichsd171UncBM1X1GRHpgzMfe2f39oVAX6AtznwmPVS1uCoxVCZ9U9Zvho8uLlG++2lXbR7GmJCwfHM2MxZl8P6SzeTkFdG+aQP+fGoPxiS3p21iA6/DM0HKn+qsxcD77qyG+0sfVNV3K9luCLBOVdcDiMh0YDTgm0QUKL0mTgC2uLdHA9NVNR/4WUTWufv71o94/VZ2+Ogj+g/h7KcXEBkRwf78IhrZj6FMPZd1oID3l2xhxqIMVm7NISYqgtP7teaClA4c07U5ERHWSG4Oz59PyWbAbuBkn8cUqCyJtAMyfO5nAkeXWec+4BMRuQFoBIzw2fa7Mtu28yPWGunQrCFPjR3EZS8v5K/vLOXpsYOsp4mpd0pKlG9+2s2M1AzmrthGQVEJ/do14cHRfRk1oB0JDa2R3PjPn1+sV6kdxEd5n75lq8TG4szZ/piIHAtMFZF+fm6LiIwHxgMkJSUxf/78aoYKubm5v2x/Xvdo3lq6lfj83Zzepf6+oXzLHA7CrbxZeSU8nX6ArPzPSYyNYPfBEv63uYj/ZRaxO09pFA0ntI3ixPYxdGpSDPkbWLxwg9dh11i4vc7gbZn9+cX6fyjnA1xVr6hk00ygg8/99vxaXVXqSmCku79vRSQOaOHntqjq88DzACkpKVqT2cx8Z0M76SRl/xvpvLV8G2cfP4jju7eo9n6DWbjNABdu5b1r1jJ+ytnE2xkNEREWrNuFKhzfrQXnH9WB0/okERdd/xrJw+11Bm/L7E911gc+t+OAcynnA70ci4DuItIF57clFwIXlVlnE3AK8IqI9Hb3vxOYDbwhIo/jNKx3Bxb6ccxaISI8OmYAa7fncsOb6cyecDwdmjWsq8MbU2M7cvKYker8iHbBut20bhLHjSd3Z0xyezuXTa2qdEAbVX3H5+914Hyg0v6vqloETMAZSn4VTi+sFSLygIiMcle7FbhaRH4A3gQuV8cKYCZOI/zHwPW13TOrMvGxUTw3LpmiYuXaaWnkFdbp4Y2pkfv+u+KXnodREcIpvVtxy6k9LIGYWledUdG6Ax39WVFV56hqD1U9QlX/7j52j6rOdm+vVNWhqjpAVQeq6ic+2/7d3a6nqn5UjThrrGvLeJ68cCArtuTwt1nLqGIvZ2M88ePWHOYs2/bL/aIS5Z20THbsy/MwKlNfVZpERGSfiOSU/gH/xZljJCyc0juJm0d05930zbz27UavwzHmsAqLS7j05d/W/JYO6WNMbfOnd1bYTwBw48ndWb45mwc/WEnvNk0Y0qWZ1yEZU65JH6xkx7783zxeWKykb9zrQUSmvvOnd9a5wOeqmu3eTwSGqep7gQ4uWERECI9fMJDRT3/Nda+n8cENJ9gw1yboTF+4iVe/3cjVJ3Rh4pl9gPDsqWTqlj9tIveWJhAAVc2ingwDXxVN4qJ5flwyBwuKuXZaGvlF1tBugkfaxj3c/f5yTujegttH9vI6HBNG/Eki5a0TluOBdE9qzD//MIAlGVncN3tl5RsYUwe2Zh/kmqnptEtswNNjBxMVabMImrrjz9mWKiKPi8gRItJVRJ4A0gIdWLA6/cg2/GnYEby5cBNvLtzkdTgmzOUVFnPN1DQOFhTx/KUpNmSJqXP+JJEbgAJgBs5vNw4C1wcyqGB322k9OaF7C+59fwWLN1ljpfGGqnLnu8tYmpnNExcMpEdS2PeBMR7w58eG+1X1DlVNcf/+pqr7K9uuPouMECZfOIhWTWL507R0dpbTG8aYQHtpwc/MWryZP5/ag9P6tvY6HBOm/PmdyKduj6zS+01FpNYniAo1TRvF8Ny4ZLIOFnD96+kUFpd4HZIJI1+t2clDc1Zxer/WTBjezetwTBjzpzqrhdsjCwBV3Us9mmO9Jvq2TeAf5/Vn4YY9/P3DVV6HY8LEhl37mfBGOj3cjh4254fxkj9JpMSdxhYAEelEOaP6hqvRA9txxdAuvPLNBt5Nz/Q6HFPP7csr5KrXUomIEF64NMUmTjOe8+cMnAgsEJEv3fsn4s7hYRx3ntGLFVuyufPdZfRIaky/dgleh2TqoZIS5ZYZP/Dzrv1MvWKIDaZogoI/DesfA4P5tXdWsqqGfZuIr+jICKZcPJhmjWK4Zmoae/YXeB2SqYeenLeWz1Zt564ze3Nct/o5x40JPf7+KqkY2AFkA31E5MTAhRSaWsTH8uwlyezMzefGNxdTZA3tphZ9tGwrk+et5Q/J7bn8uM5eh2PML/zpnXUV8BXOvCD3u//vC2xYoWlAh0Qmje7HgnW7+L9PVnsdjqknVm3N4da3fmBQx0QmndsPEWtIN8HDnyuRm4CjgI2qOhwYhDP7oCnH+Ud14OKjO/Lcl+v5YKk/E0AaU7E9+wu4+rVUGsdF8dwlycRG1b/pbE1o8yeJ5KlqHoCIxKrqj0DPwIYV2u49uy+DOyby17eXsnrbPq/DMSGqsLiE619PZ8e+fJ4bl0KrJjZytAk+/iSRTPfHhu8Bn4rI+/g3x3rYiomK4JlLkmkUG8U1U1PJPljodUgmBP39w1V8u343D517JAM7JFa+gTEe8Kd31rmqmqWq9wF3Ay8B5wQ6sFCX1CSOZy4eTObeg9w8fTElJfbTGuO/makZvPLNBq4Y2oUxye29DseYClVpzGhV/VJVZ6uqX31YRWSkiKwWkXUickc5y58QkSXu3xoRyfJZ9qiIrBCRVSIyWUKwNTGlczPuPbsPX6zeyZPz1nodjgkR6Zv2ctes5Qzt1py/nWFzg5jgFrCfu4pIJDAFOBXIBBaJyGxV/WUiDlW9xWf9G3Aa7RGR44ChQH938QLgJGB+oOINlEuO6cQPmdlMnreWfm2b2EB55rC25+Rx7dQ0WifE2dwgJiQE8gwdAqxT1fXulct0YPRh1h8LvOneViAOiAFigWhgewBjDRgRYdI5/ejfPoE/z/yBn3bmeh2SCVJ5hcWMn5pGbn4RL1yaQtNGMV6HZEylRDUwdfUiMgYYqapXuffHAUer6oRy1u0EfAe0V9Vi97F/AlcBAjytqhPL2W487hAsSUlJydOnT692vLm5ucTHx1d7+8rsPljCfd8cpHGMcPexDWgQ5X3tXKDLHGyCubyqyovLCvh6SxE3DIolOal2KgmCucyBYmWumuHDh6epakp1jx3I0dvK+5SsKGNdCLztk0C6Ab2B0hbFT0XkRFX96pCdqT4PPA+QkpKiw4YNq3aw8+fPpybb+6NNj12Me2kh729twjOXDPb8R2N1UeZgEszlfWnBz3y9ZSU3ndKdW07tUWv7DeYyB4qVuW4FsjorE+jgc789FXcNvpBfq7IAzgW+U9VcVc0FPgKOCUiUdei4I1pw5+m9+HjFNv49/yevwzFBYsHaXTw0ZxWn9UniplO6ex2OMVUSyCSyCOguIl1EJAYnUcwuu5KI9ASaAt/6PLwJOElEokQkGqdRvV5M2HHl8V04e0Bb/vnJauav3uF1OMZjG3fv5/o30jmiZSMev2CgzQ1iQk7AkoiqFgETcMbaWgXMVNUVIvKAiIzyWXUsMF0PbZx5G/gJWAb8APygqv8NVKx1SUT4x3lH0jOpMTdNX8Km3Qe8Dsl4JDe/iKtfSwXghUtTiLe5QUwICuhZq6pzgDllHrunzP37ytmuGLgmkLF5qWFMFM+PS+Hspxcwfmoq7153HA1j7AMknJSUKLfOXMK6Hbm8dsXRdGreyOuQjKkW64TukY7NG/KvCweyevs+7nhnGYHqJWeC0+TP1zJ3xXYmntmH47vb3CAmdFkS8dCwnq247bSezP5hCy8t+NnrcEwd+Xj5Np78bC3nDW7PFUM7ex2OMTViScRj1w07gpF9W/PwRz/yzU+7vA7HBNjqbfu4deYSBrRP4O82N4ipByyJeExE+Of5A+jSohET3ljM5qyDXodkAiTrgDM3SMPYKJ4bl0JctM0NYkKfJZEgEB8bxXPjkiksKuFP09LIKyz2OiRTy4qKS5jwxmK2Zefx7CXJtE6wuUFM/WBJJEgc0TKexy8YyNLMbO5+b7k1tNczD3/0IwvW7WLSuf1I7tTU63CMqTWWRILIqX2SuPHkbryVlsm07zd5HY6pJW+nZfLSgp+5/LjOnJ/SofINjAkhlkSCzM0jejC8Z0vun72C1A17vA7H1NCSjCz+NmsZx3ZtzsQze3sdjjG1zpJIkImIEJ68cBDtmzbgT6+nsz0nz+uQTDXtyMnjmqmptGocy5SLBxNtc4OYesjO6iCU0CCa58alsD+/iOteT6egqMTrkEwV5RcVc820NHIOOnODNLO5QUw9ZUkkSPVs3ZhHx/QnbeNeHvhghdfhmCpQVe6atZzFm7J4/PwB9G7TxOuQjAkYG7ApiJ3Vvy3LNmfz3Jfr6d8ukfOPskbZUPDqNxt4Ky2TG0/uxulHtvE6HGMCyq5EgtxfTuvJ8d1acNd7y/khI8vrcEwlvlm3iwc/XMWI3kncPKL2JpcyJlhZEglyUZERPDV2EC0bx3LttDR25eZ7HZKpQMaeA1z3RjpdWjTiiQsG2NwgJixYEgkBTRvF8Ny4ZPbsL2DCG+kUFVtDe7DZ784NUlKivHBpCo3jor0OyZg6YUkkRPRrl8DDvz+S79bv4eGPfvQ6HONDVbntrR9Ys30fT100mC4tbG4QEz6sYT2E/H5we5ZmZvPSgp/p3z6B0QPbeR2SAZ7+fB0fLd/GxDN6c1KPll6HY0ydsiuREDPxzN4M6dKM299Zyoot2V6HE/Y+WbGNxz5dw7mD2nHVCV28DseYOhfQJCIiI0VktYisE5E7yln+hIgscf/WiEiWz7KOIvKJiKwSkZUi0jmQsYaK6MgIplw0mMQGMVw7LY2sAwVehxS21mzfxy0zltC/vVPVaHODmHAUsCQiIpHAFOB0oA8wVkT6+K6jqreo6kBVHQg8Bbzrs/g14P9UtTcwBNgRqFhDTcvGsTxzyWC2Z+dzw5uLKS6xEX/rWvaBQsa/lkqDGGcYf5sbxISrQF6JDAHWqep6VS0ApgOjD7P+WOBNADfZRKnqpwCqmquqBwIYa8gZ1LEpD4zuy//W7uKxT1Z7HU5YKSouYcKb6WzOOsizlwymTUIDr0MyxjMSqHkrRGQMMFJVr3LvjwOOVtUJ5azbCfgOaK+qxSJyDnAVUAB0AT4D7lDV4jLbjQfGAyQlJSVPnz692vHm5uYSHx9f7e298sryfOZnFnH9wFiOal21fhKhWubqqq3yTv+xgI83FHJ53xiGdQjurrzh9hqDlbmqhg8fnqaqKdU+uKoG5A/4A/Ciz/1xwFMVrHu77zJgDJANdMXpQfYOcOXhjpecnKw18cUXX9Roe6/kFRbpOVMWaJ+7P9I123KqtG2olrm6aqO876ZnaKfbP9C731tW84DqQLi9xqpW5qoCUrUGn/WBrM7KBHwHe2oPbKlg3Qtxq7J8tl2sTlVYEfAeMDggUYa42KhInrk4mQYxUYyfmkZOXqHXIdVbSzOzuP2dZRzdpRl3n9Wn8g2MCQOBTCKLgO4i0kVEYnASxeyyK4lIT6Ap8G2ZbZuKSGmn+5OBlQGMNaS1Tojj3xcPJmPPAf48Ywkl1tBe63bsy2P8a2m0jI/l3zY3iDG/CNg7wb2CmADMBVYBM1V1hYg8ICKjfFYdC0x3L6tKty0GbgPmicgyQIAXAhVrfTDE/Xb82aodTP58rdfh1Cv5RcX8aVo6WQcLeP7SZJrHx3odkjFBI6C/WFfVOcCcMo/dU+b+fRVs+ynQP2DB1UOXHtuJHzKzePKztRzZLoFTeid5HVLIU1XufX8FaRv38vRFg+jbNsHrkIwJKnZNXo+ICA+deyT92jXh5hlL+HnXfq9DCnnTvtvI9EUZXD/8CM7q39brcIwJOpZE6pm46EievSSZqAhh/Gup7M8v8jqkkPXtT7u5/78rOaVXK249tafX4RgTlCyJ1EPtmzbk6YsG89POXP7y9g/4NDcZP2XsOcD1b6TTqXlDnrhwoM0NYkwFLInUU0O7teCO03sxZ9k2nv1yvdfhhJQDBUWMn5pGYXEJL1yaQhObG8SYClkSqceuPqErZ/Vvw//N/ZH/rd3pdTghQVX5y1tL+XFbDk+NHUTXluH1y2djqsqSSD0mIjw6pj/dWzXmhjcXk7HHhh+rzL/n/8SHy7Zy+8heDOvZyutwjAl6lkTquYbuKLMlJco1U9M4WFBc+UZhat6q7fzzk9WMGtCWa07s6nU4xoQESyJhoHOLRvzrwkGs2pbDne8utYb2cqzbsY+bpi+hb9sm/OO8/jY3iDF+siQSJob3asWfR/TgvSVb+M/XG9iRk8dD3x9kx748r0PzXPbBQq5+LY246AieG5dCgxibG8QYf1kSCSPXD+/GqX2S+PucVUyctYy1e0uYPG+d12F5qrhEufHNxWTuPcAzlyTTLtHmBjGmKiyJhJGICOHx8wfQLjGOT1ftQIHpCzcxfeEmFm/ay/acvLCbJfHRuT/y5Zqd3DeqL0d1buZ1OMaEnICOnWWCT+O4aAa0T2TTnoMAFJUod7y77JflURFCUpM42iTE0SaxgfM/IY42Ce7txDhaNIqtFz++e3/JZp77cj0XH92Ri4/u5HU4xoQkSyJhZkdOHp+s3H7IYzGRETxy3pEcKChma/ZBtmblsTU7j2WZWcxdkUdBUckh60dHOommbUID2iTG0TrBvV2abBLjaN4oJqgbp5dlZvPXt5cypHMz7j27r9fhGBOyLImEmcnz1lJSpneWoqRvymLSOf1+s76qsvdAIVuyDrI1O49t2QfZkp3HVvf+4k1ZbMvOo6D40EQTExlBa/cqpm1iAzfROEmmtftY04bRniSanfvyGT81leaNYvj3JYOJibJaXWOqy5JImEnflEVh8aFJpLBYSd+4t9z1RYRmjWJo1iiGfu3KHwa9pETZc6CArVl5bMk+yLZs5//WrDy2ZeexaMMetufk/ea4sVERh1y9lN5umxhH6ybO/4QGtZtoCopKuO71NPYeKODta4+jhc0NYkyNWBIJM3NuOuGX2/Pnz2fYsGE13mdEhNAiPpYW8bEc2b7iRLNrf75bVeZcxWzNzmNLlpN0vl+/h23lNOw3iI78pS2mNLH4ts+0SWhAk7ioShNNaZfmvtuXsmjDXiaPHVRhUjTG+M+SiKkTERFCq8ZxtGocx4AOieWuU1yi7MrN/6XqbKtPtdnW7IN889MutufkUbYDWcOYyF+rzZo4HQLaJsT9Um3WJiGOyfPWsmZvCWv2bubak45g1ACbG8SY2mBJxASNSLdnWFKTOAZVsE5RcQk7c/PZ4l7RbMvO++X21uw81mzfyY59+VT0o/wIgcuP6xyoIhgTdgKaRERkJPAvIBJ4UVUfKbP8CWC4e7ch0EpVE32WN8GZn32Wqk4IZKwmNERFRrjVWQ2ApuWuU1hcwo59+WzNcjoBvPL1zyzJyKJEnUT19Bfryu1EYIypuoB1SxGRSGAKcDrQBxgrIn1811HVW1R1oKoOBJ4C3i2zmweBLwMVo6mfoiMjaJfYgJTOzTimSzNWbMn5pQqssFh5OzXDhnsxppYEsm/jEGCdqq5X1QJgOjD6MOuPBd4svSMiyUAS8EkAYzT1XHldmotVw364F2NqSyCrs9oBGT73M4Gjy1tRRDoBXYDP3fsRwGPAOOCUig4gIuOB8QBJSUnMnz+/2sHm5ubWaPtQFA5l/mrlgXK7NH+1YhPzE3d5FFXdCYfXuCwrc90KZBIpr89lRQMzXQi8raqlk11cB8xR1YzDdd1U1eeB5wFSUlK0Jt1Va6u7aygJhzJ/NezX2+FQ3rKszOHByzIHMolkAh187rcHtlSw7oXA9T73jwVOEJHrgHggRkRyVfWOgERqjDGmWgKZRBYB3UWkC7AZJ1FcVHYlEemJ083m29LHVPVin+WXAymWQIwxJvgErGFdVYuACcBcnG66M1V1hYg8ICKjfFYdC0xXm27PGGNCTkB/J6Kqc4A5ZR67p8z9+yrZxyvAK7UcmjHGmFpgw5caY4ypNqkvtUgishPYWINdtADqf5/PQ4VbmcOtvGBlDhc1KXMnVW1Z3QPXmyRSUyKSqqopXsdRl8KtzOFWXrAyhwsvy2zVWcYYY6rNkogxxphqsyTyq+e9DsAD4VbmcCsvWJnDhWdltjYRY4wx1WZXIsYYY6rNkogxxphqC/skIiIjRWS1iKwTkXo/PpeIvCwiO0Rkudex1BUR6SAiX4jIKhFZISI3eR1ToIlInIgsFJEf3DLf73VMdUFEIkVksYh84HUsdUVENojIMhFZIiKpdX78cG4TcWdfXAOcijPq8CJgrKqu9DSwABKRE4Fc4DVVDYs5YkWkDdBGVdNFpDGQBpxTz19nARqpaq6IRAMLgJtU9TuPQwsoEfkzkAI0UdWzvI6nLojIBpxBaj35gWW4X4lUdfbFkKeqXwF7vI6jLqnqVlVNd2/vwxkQtJ23UQWWOnLdu9HuX73+xigi7YEzgRe9jiWchHsSKW/2xXr94RLuRKQzMAj43ttIAs+t2lkC7AA+VdX6XuYngb8CJV4HUscU+ERE0tzZXutUuCeRqsy+aEKciMQD7wA3q2qO1/EEmqoWq+pAnAnhhohIva2+FJGzgB2qmuZ1LB4YqqqDgdOB690q6zoT7kmkKrMvmhDmtgu8A7yuqu96HU9dUtUsYD4w0uNQAmkoMMptH5gOnCwi07wNqW6o6hb3/w5gFk41fZ0J9yTyy+yLIhKDM/vibI9jMrXMbWR+CVilqo97HU9dEJGWIpLo3m4AjAB+9DaqwFHVO1W1vap2xnkff66ql3gcVsCJSCO3swgi0gg4DajTnpdhnUQqmn3R26gCS0TexJmKuKeIZIrIlV7HVAeGAuNwvp0ucf/O8DqoAGsDfCEiS3G+LH2qqmHT7TWMJAELROQHYCHwoap+XJcBhHUXX2OMMTUT1lcixhhjasaSiDHGmGqzJGKMMabaLIkYY4ypNksixhhjqs2SiDEeEpFh4TTirKl/LIkYY4ypNksixvhBRC5x5+dYIiLPuYMb5orIYyKSLiLzRKSlu+5AEflORJaKyCwRaeo+3k1EPnPn+EgXkSPc3ceLyNsi8qOIvO7+wh4ReUREVrr7+adHRTfmsCyJGFMJEekNXIAz0N1AoBi4GGgEpLuD330J3Otu8hpwu6r2B5b5PP46MEVVBwDHAVvdxwcBNwN9gK7AUBFpBpwL9HX3MymwpTSmeiyJGFO5U4BkYJE7tPopOB/2JcAMd51pwPEikgAkquqX7uOvAie64xu1U9VZAKqap6oH3HUWqmqmqpYAS4DOQA6QB7woIr8HStc1JqhYEjGmcgK8qqoD3b+eqnpfOesdbgyh8qYdKJXvc7sYiHLHdRuCM/LwOUCdjodkjL8siRhTuXnAGBFpBSAizUSkE877Z4y7zkXAAlXNBvaKyAnu4+OAL935SzJF5Bx3H7Ei0rCiA7pznySo6hycqq6BgSiYMTUV5XUAxgQ7VV0pInfhzB4XARQC1wP7gb4ikgZk47SbAFwGPOsmifXAH93HxwHPicgD7j7+cJjDNgbeF5E4nKuYW2q5WMbUChvF15hqEpFcVY33Og5jvGTVWcYYY6rNrkSMMcZUm12JGGOMqTZLIsYYY6rNkogxxphqsyRijDGm2iyJGGOMqbb/B567YrYkiyMAAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# visualization \n",
    "# ******IMPORTANT******: DO NOT run this cell if you didn't run the cell (the second cell above) that train the BERT model.\n",
    "# To see the graph, you may also refer to the bert_accuracy.png in the original zipped folder\n",
    "plt.grid()\n",
    "plt.plot(history.history['accuracy'], marker = '^')\n",
    "plt.xlabel(\"epochs\")\n",
    "plt.ylabel(\"accuracy number\")\n",
    "plt.title(\"Accuracy on Training Set for each Epoch at Step Size = 25\")\n",
    "plt.savefig('bert_accuracy.png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now as we finished the training task, let's go ahead and apply our model to some new instances!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sentence_1 is a paraphrase of sentence_0\n",
      "sentence_2 is a paraphrase of sentence_0\n",
      "sentence_3 is not a paraphrase of sentence_0\n"
     ]
    }
   ],
   "source": [
    "# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task\n",
    "sentence_0 = \"This research was consistent with his findings.\"\n",
    "sentence_1 = \"His findings were compatible with this research.\"\n",
    "sentence_2 = \"His findings were not compatible with this research.\"\n",
    "sentence_3 = \"This is purely a trouble maker\"\n",
    "inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')\n",
    "inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')\n",
    "inputs_3 = tokenizer.encode_plus(sentence_0, sentence_3, add_special_tokens=True, return_tensors='pt')\n",
    "\n",
    "# use the model that we saved\n",
    "pred_1 = model(**inputs_1)[0].argmax().item()\n",
    "pred_2 = model(**inputs_2)[0].argmax().item()\n",
    "pred_3 = model(**inputs_3)[0].argmax().item()\n",
    "\n",
    "print(\"sentence_1 is\", \"a paraphrase\" if pred_1 else \"not a paraphrase\", \"of sentence_0\")\n",
    "print(\"sentence_2 is\", \"a paraphrase\" if pred_2 else \"not a paraphrase\", \"of sentence_0\")\n",
    "print(\"sentence_3 is\", \"a paraphrase\" if pred_3 else \"not a paraphrase\", \"of sentence_0\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sentence_1 is not a paraphrase of sentence_0\n",
      "sentence_2 is a paraphrase of sentence_0\n"
     ]
    }
   ],
   "source": [
    "sentence_0 = \"Practical data science is applied in many fields.\"\n",
    "sentence_1 = \"I like cooking at home with some guidance from cookbook.\"\n",
    "sentence_2 = \"Applied Data Science skill is important in various industries.\"\n",
    "inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')\n",
    "inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')\n",
    "\n",
    "\n",
    "pred_1 = model(**inputs_1)[0].argmax().item()\n",
    "pred_2 = model(**inputs_2)[0].argmax().item()\n",
    "\n",
    "print(\"sentence_1 is\", \"a paraphrase\" if pred_1 else \"not a paraphrase\", \"of sentence_0\")\n",
    "print(\"sentence_2 is\", \"a paraphrase\" if pred_2 else \"not a paraphrase\", \"of sentence_0\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summary\n",
    "Now we have some hands on experience on training BERT and some steps closer to it. It is a hugh breakthrough in the NLP community since it first released last year. The fact that it is applicable in more than 100 languages and approachable will allow more practical applications in the future. Also because of it is semantic in nature, BERT is a huge breakthrough of technologies in the semantic analysis, which is the pain-point for lots of NLP tasks. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Appendix\n",
    "![mlm](mlm.png)\n",
    "**Comparisom of accuracy on mased language model vs traditional left-to-right model *(Google, 2018)***\n",
    "\n",
    "![image.png](GLUE.png)\n",
    "\n",
    "**BERT performance comparison on GLUE dataset *(Google, 2018)***\n",
    "\n",
    "![squad](bert_implement.png)\n",
    "**BERT implementation example of pretraining and fine-tuning *(Google, 2018)***"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Citations\n",
    "1. Google Research: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. October 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, from: https://arxiv.org/abs/1810.04805\n",
    "2. BERT Open-source: https://github.com/google-research/bert\n",
    "3. Huggingface Transformers: https://github.com/huggingface/transformers\n",
    "4. Wilson L Taylor. 1953. Cloze procedure: A new tool for measuring readability. Journalism Bulletin, 30(4):415–433.\n",
    "5. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Pro- cessing Systems, pages 6000–6010.\n",
    "6. Tensorflow: https://www.tensorflow.org/\n",
    "7. PyTorch: https://pytorch.org/\n",
    "8. Tensorflow dataset: https://medium.com/tensorflow/introducing-tensorflow-datasets-c7f01f7e19f3\n",
    "9. GLUE/MRPC dataset: https://www.microsoft.com/en-us/download/details.aspx?id=52398"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# END of this Tutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
